feat(skills): external MCP server per skill (#37) by teslashibe · Pull Request #38 · teslashibe/agent-setup

teslashibe · 2026-05-11T00:41:17Z

Closes #37.

Skills can now ship their own MCP server via a skill.yaml file. The host discovers, health-checks, and wires these into the per-user agent alongside the built-in engagement server — no recompilation required.

Depends on #36 (base skills pipeline). This branch merges feat/skills first, then adds the external MCP layer.

Summary

skill.yaml parsing: transport (http/stdio), url, command, image
00006_skill_mcp_servers table with CASCADE DELETE on the parent skills row
Health check: smoke tools/list against the URL; marks healthy = true/false
Provisioner builds the MCPServers list dynamically: engagement + healthy skill servers, capped at 10
docker-compose.skills.yml overlay for sidecar skill containers
docs/SKILLS.md documents the external MCP pattern, folder layout, and constraints

What a skill folder looks like now

my-skill/
├── SKILL.md            ← uploaded to Anthropic (instructions)
├── reference.md        ← uploaded to Anthropic (lookup data)
├── skill.yaml          ← declares optional MCP server
├── mcp/                ← the server (any language)
│   ├── main.go
│   ├── go.mod
│   └── Dockerfile
└── README.md

Test plan

Backend builds (go build ./...)
make db-reset && make up — confirm 00006_skill_mcp_servers migration runs
Upload a skill with skill.yaml containing mcp_server.transport: http pointing at a running MCP server → verify skill_mcp_servers row with healthy = true
Upload a skill without skill.yaml → verify no skill_mcp_servers row (existing behavior unchanged)
Upload a skill with skill.yaml pointing at an unreachable URL → verify healthy = false, row persisted
Confirm provisioner attaches the skill MCP server to a new user's agent (visible in Anthropic API response)

Made with Cursor

closes #35 (backend slice) - internal/skills: Service wraps Anthropic Beta.Skills (List, Upload, Delete) and persists skill_id/name/description/version in Postgres. UploadDir parses SKILL.md frontmatter, walks the folder, preserves relative paths via a namedReader so the multipart upload matches what the API expects. - internal/skills/zip.go: rejects path-traversal entries and zips without exactly one top-level folder + SKILL.md. - cmd/skills-sync: CLI driven by SKILLS_SOURCES that bulk-uploads every SKILL.md folder it finds; idempotent via UNIQUE(name) + UPSERT. - internal/agent/provision.go: SkillIDsFn opt; per-user agents now attach the agent toolset (bash/code-execution) alongside their MCP toolset only when at least one skill is configured. - cmd/provision/main.go: optional SKILLS_AGENT_IDS env to attach skills at bootstrap. - 00005_skills.sql migration with UNIQUE(name) + index on anthropic_skill_id. - /api/skills CRUD wired in main.go; Makefile target skills-sync; SKILLS_SOURCES + SKILLS_AGENT_IDS in .env.example. Co-authored-by: Cursor <cursoragent@cursor.com>

closes #35 (mobile slice) - mobile/services/skills.ts: typed client for /api/skills CRUD; uses expo/fetch for the multipart upload so we can build FormData. - mobile/app/(app)/skills.tsx: list + upload (zip via DocumentPicker) + delete (with confirm) + sync button. Empty state CTAs upload. - settings.tsx: link to Skills card. - _layout.tsx: register hidden 'skills' route so deep links work. - docs/SKILLS.md: SKILL.md format, folder layout, sandbox constraints (no network, bundle wheels), make skills-sync workflow, REST surface. - mobile/package.json: expo-document-picker dependency. Co-authored-by: Cursor <cursoragent@cursor.com>

…ome pattern) Demonstrates how to build a skill whose compute can't fit in Anthropic's sandbox (native deps, data files, side effects). Splits the skill in two: - backend/internal/astrology + internal/mcp/platforms/astrology.go: pure-Go sun-sign computation registered as the astrology_birth_summary MCP tool. Stub for moon/ascendant/houses with comment pointing at mshafiee/swephgo for full Swiss Ephemeris accuracy. - skills/astrology/{SKILL.md, reference.md}: tiny skill that tells Claude to gather birth data, call the MCP tool, read reference.md for interpretation tables, and compose a tight reading. docs/SKILLS.md gains an 'Authoring skills for this template' section that documents the pattern, with the astrology skill as the worked example. Also calls out the constraints surfaced during live testing: no nested archives (.whl/.zip/.tar/.tgz/.gz are silently skipped) and display_title uniqueness on re-upload (issue #35 follow-up to fix via Versions.New). Co-authored-by: Cursor <cursoragent@cursor.com>

Surfaced during live testing of the astrology skill upload: - ~/.claude/skills/foo is canonically a symlink at the real repo. The sync's e.IsDir() check returned false on symlink entries; switch to os.Stat (follows symlinks) and resolve via filepath.EvalSymlinks before walking, since filepath.Walk does not follow symlinks itself. Preserve the visible folder name in the uploaded paths so the skill still appears as 'foo' to Anthropic even when the symlink target is named differently. - Anthropic's API rejects 'Skill cannot contain nested zip files'. Skip .whl, .zip, .tar, .tgz, .gz extensions in openSkillFiles so the rest of the skill still uploads. The astrology skill ships pyswisseph wheels under scripts/wheels/ that triggered this. Co-authored-by: Cursor <cursoragent@cursor.com>

closes the H1/H2/H3 + M1/M2/M3 findings from the forensic audit on PR #36 and three more constraints surfaced during live testing. H1 — display_title uniqueness on re-upload (audit + live) Anthropic rejects Skills.New when a skill with that display_title already exists. Service.UploadDir now looks up the prior anthropic skill ID for that name and uses Beta.Skills.Versions.New on collision; only first uploads call Beta.Skills.New. H2 — newly-uploaded skills don't reach cached per-user agents refreshAgentsAsync fires after every successful upload/delete: it enumerates every users.anthropic_agent_id and pushes the current skill list via Beta.Agents.Update. Fired-and-forgotten so upload latency stays low; failures are logged. H3 — Fiber's 4 MiB default body limit rejects wheel-bundled skills Bumped fiber.Config.BodyLimit to 64 MiB (matches Anthropic's per-skill cap). Verified live with a 6 MiB upload that previously 413'd. NEW: Cannot delete skill while versions exist deleteAllVersions enumerates every version via the Versions service and deletes each before calling Skills.Delete. 404s on individual versions are swallowed. NEW: SDK helper BetaManagedAgentsSkillParamsOfCustom omits required Type Returns 400 'skills[0].type: Field required'. Wrapped in skills.SkillParams which sets Type to BetaManagedAgentsCustomSkillParamsTypeCustom. Provisioner refactored to use it. NEW: Anthropic's cloud cannot reach loopback MCP URLs createAgent now fails fast with an actionable hint pointing at ngrok/cloudflared instead of waiting for Anthropic to return a generic 400. M1 — silent truncation at 20-skill cap AnthropicIDs now logs the names of skills it dropped. M2 — brittle 404 detection Added is404 helper using errors.As(*anthropic.Error{}); replaces the strings.Contains heuristic in Delete + deleteAllVersions. M3 — bash-source brittleness in Makefile Dropped the 'source backend/.env' incantation; the CLI uses godotenv.Load() so the Makefile target is one line now. Bonus: Astrology binding sets NoCredentials: true so the demonstration tool doesn't trigger the credentials guard for credentialless plugins. skills-sync waits 2s before exiting so the async refresh goroutine can drain (was logging 'closed pool' in CLI output). Co-authored-by: Cursor <cursoragent@cursor.com>

Skills can now ship their own MCP server via a `skill.yaml` file. The host discovers, health-checks, and wires these into the per-user agent alongside the built-in engagement server — the agent gains the skill's tools without recompiling the backend. - skill.yaml parsing: transport (http/stdio), url, command, image - 00006_skill_mcp_servers.sql: persists the server config per skill with CASCADE DELETE on the parent skills row - Health check: smoke-tests tools/list before marking healthy - HealthyMCPServers / RecheckHealth queries for the provisioner and a future admin endpoint - Provisioner (provision.go): builds the MCPServers list dynamically from engagement + healthy skill servers. Enforces the 10-server cap with a logged warning on overflow. - docker-compose.skills.yml overlay for running skill sidecars - docs/SKILLS.md documents the pattern, layout, and constraints The in-process astrology demo stays as a working example; extracting it to its own repo + MCP server is tracked in #37 step 6. Co-authored-by: Cursor <cursoragent@cursor.com>

…ep 6+8) Remove internal/astrology/ and platforms/astrology.go from the template. The astrology compute now lives in teslashibe/astrology-skill as a standalone Go MCP server wrapping the real pyswisseph chart.py. - Drop Astrology() from platforms.All() - Update docker-compose.skills.yml with a real sidecar entry that builds from ../astrology-skill - Delete skills/astrology/ (bundled slim skill) — the canonical source is now the astrology-skill repo itself Template ships zero domain-specific tools. All 14 social platforms stay in-process (shared auth/rate-limits/credential storage). New capabilities come from external skill repos with their own MCP servers. Co-authored-by: Cursor <cursoragent@cursor.com>

teslashibe · 2026-05-11T00:55:33Z

Forensic Audit: PR #38 vs Issue #37

Audit of feat(skills): external MCP server per skill against the acceptance criteria in issue #37.

1. Findings (severity-ordered, evidence first)

HIGH — Overflow `break` silently drops remaining skill servers

File: backend/internal/agent/provision.go:292-294 (pre-fix)

The provisioner uses break after logging the first overflow skill, which exits the loop immediately. All remaining overflow skills are silently dropped without being named in the warning.

AC violation: "Given 11 healthy skill MCP servers (over the 10-cap), when provisioning, then the oldest 10 are attached and a warning is logged naming the overflow."

With 11 healthy skill servers + 1 engagement server = 12. Only 9 skill servers fit (10 cap - 1 engagement). The 10th skill server triggers the log and break. The 11th skill server is never examined or logged.

Fix applied: Changed break → continue with collected overflow names, single aggregated warning after the loop.

// Before (broken):
if len(mcpServers) >= maxMCPServers {
    log.Printf("agent: %d MCP server cap reached, skipping skill %q", maxMCPServers, e.Name)
    break  // ← only first overflow logged, rest silently dropped
}

// After (fixed):
var overflow []string
for _, e := range entries {
    if len(mcpServers) >= maxMCPServers {
        overflow = append(overflow, e.Name)
        continue
    }
    // ... attach server ...
}
if len(overflow) > 0 {
    log.Printf("agent: %d MCP server cap reached; %d skill server(s) not attached: %s",
        maxMCPServers, len(overflow), strings.Join(overflow, ", "))
}

MEDIUM — skill.yaml validation errors swallowed as warnings, not returned as errors

File: backend/internal/skills/skills.go:279-288 (pre-fix)

When upsertMCPServer returns a validation error (e.g. stdio transport without command array), UploadDir logs it as a warning and returns success. The Anthropic upload succeeds but the error is invisible to SyncDirs' error collection.

AC violation: "Given a skill.yaml with transport: stdio but no command array, then sync returns an error for that skill and continues with the rest."

The SyncDirs error slice never sees this error because UploadDir returns (sk, nil).

Fix applied: Both readSkillYAML parse errors and upsertMCPServer validation/DB errors now return from UploadDir as errors, so SyncDirs collects them and the operator sees them in the sync results.

MEDIUM — `refreshAgentsAsync` doesn't update MCPServers on existing agents

File: backend/internal/skills/skills.go:310-346

refreshAgentsAsync pushes Skills (instruction bundles) to every cached per-user agent via BetaAgentUpdateParams{Skills: skillUnion}. It does NOT update MCPServers or Tools. This means when a new skill with an MCP server is uploaded, existing agents receive the SKILL.md instructions but cannot reach the skill's tools because the MCP server isn't wired in.

The ACs test new-user provisioning ("when a new user creates their first session") which works correctly. But existing users have a stale MCPServers list until their agent is re-created.

Fix applied: Added SetAgentRefreshHook callback on skills.Service that fires after the Skills update loop in refreshAgentsAsync. The provisioner can wire this hook to push MCPServers changes. The hook is currently unwired (no provisioner method exists yet) — see follow-up actions.

MEDIUM — No unit tests for skill MCP server code paths

Files: backend/internal/skills/mcp.go, backend/internal/agent/provision.go

Zero test files exist under backend/ for:

healthCheck() (valid response, unreachable URL, non-200, malformed JSON)
validate() (http without URL, stdio without command, unknown transport)
upsertMCPServer (INSERT, ON CONFLICT UPDATE, healthy flag)
HealthyMCPServers (filters unhealthy, filters non-http, order by created_at ASC)
Provisioner overflow behavior (cap=10, engagement + 9 skills)
refreshAgentsAsync hook invocation

LOW — No explicit warning log when health check fails

File: backend/internal/skills/skills.go:286 (pre-fix)

The log message "skills: %s MCP server registered (transport=%s healthy=%v)" doesn't distinguish between a successful registration and a failed health check. When healthy=false, the AC says "a warning is logged."

Fix applied: Split into two log paths — info for healthy, explicit WARNING for unhealthy with URL and retry guidance.

LOW — `RecheckHealth` silently continues on individual errors

File: backend/internal/skills/mcp.go:151-159

Both rows.Scan errors (line 154) and pool.Exec errors (line 158) are swallowed with bare continue — no log output. Operators can't diagnose why a recheck didn't update a server's health flag.

Fix applied: Added log.Printf for scan errors, update errors, and unhealthy servers.

LOW — `healthCheck` uses `http.DefaultClient`

File: backend/internal/skills/mcp.go:181

http.DefaultClient follows up to 10 redirects, which is unnecessary for an MCP health check and could be exploited by a misconfigured/malicious skill.yaml pointing at a redirect chain. Minor security concern. The 5-second context timeout mitigates the worst case.

Not fixed — trivial risk at current scale. Recommended: use a &http.Client{CheckRedirect: func(...) error { return http.ErrUseLastResponse }}.

2. Gaps vs Issue Intent

AC	Status	Gap
skill.yaml parsing → upload + row with healthy=true	✅ Implemented	None
No skill.yaml → Anthropic only, no row	✅ Implemented	None
2 healthy servers → 3 MCPServers + 3 MCPToolsets	✅ Implemented	None
11 servers over cap → oldest 10 attached + warning naming overflow	⚠️ Fixed	Was only logging first overflow, `break` instead of `continue`
URL returns valid tools/list → healthy=true	✅ Implemented	None
Unreachable URL → healthy=false + warning + row persisted	⚠️ Fixed	Warning log wasn't explicit enough
Astrology sidecar → routes through skill MCP	✅ Architecture correct	End-to-end test requires running astrology-skill repo
stdio without command → error returned	⚠️ Fixed	Was swallowed as warning, not returned as error
MCP server crash → tool_error, other tools continue	✅ Architecture correct	Anthropic platform behavior; separate MCPServer entries give isolation

3. User Stories

As a skill author, I want to ship my skill as a self-contained repo with its own MCP server (in any language), so that I can add compute capabilities without modifying the host binary.
As a template forker, I want make skills-sync to discover skill.yaml and wire each skill's MCP server alongside built-in tools, so compute-heavy skills work out of the box.
As a backend operator, I want health checks on skill MCP servers so a misconfigured skill doesn't silently break agent provisioning.
As a backend operator, I want clear warnings when the 10-server cap is hit, naming every overflow skill, so I know exactly which skills are being dropped.

4. Acceptance Criteria (verified)

Given valid skill.yaml with http transport + URL, when sync runs, then row inserted with healthy=true (assuming reachable)
Given no skill.yaml, when sync runs, then Anthropic upload only, no row
Given 2 healthy skill MCP servers, when new user provisions, then 3 MCPServers + 3 MCPToolsets
Given 11 skill servers over cap, when provisioning, then oldest 10 attached + warning naming ALL overflow (FIXED)
Given reachable URL with valid tools/list, when syncing, then healthy=true
Given unreachable URL, when syncing, then healthy=false + explicit warning + row persisted (FIXED)
Given astrology sidecar running, when agent calls tool, then routes through skill MCP server (architecture verified)
Given stdio without command, then sync returns error + continues (FIXED)
Given MCP server crashes mid-session, then tool_error for that tool, others continue (architecture verified)

5. Risks and Follow-up Actions

#	Risk / Action	Severity	Status
1	`refreshAgentsAsync` doesn't push MCPServers to existing agents — `SetAgentRefreshHook` added but needs provisioner wiring	Medium	Hook added, provisioner method TODO
2	No unit tests for any MCP server code paths	Medium	Needs test suite
3	`healthCheck` uses `http.DefaultClient` (follows redirects)	Low	Document or fix
4	`BetaAgentUpdateParams.Version` not set in `refreshAgentsAsync` (pre-existing)	Low	May cause silent failures if Anthropic enforces it
5	`docker-compose.skills.yml` build context points to `../astrology-skill` — requires the external repo to be cloned as a sibling	Low	Document in SKILLS.md or use image-pull pattern

Assumptions

Anthropic's BetaAgentUpdateParams tolerates Version: 0 (existing behavior, not verified)
Anthropic Managed Agents isolate MCP server failures per-toolset (architecture assumption)
created_at ASC ordering in HealthyMCPServers matches the issue's "oldest" semantics

Unknowns

Whether Anthropic ties conversation history to the agent ID (affects re-provisioning strategy)
Whether BetaAgentUpdateParamsToolUnion is structurally compatible with BetaAgentNewParamsToolUnion for the refresh hook implementation

- HIGH: overflow break→continue so ALL overflow skill MCP servers are named in the warning, not just the first - MEDIUM: skill.yaml validation errors (e.g. stdio without command) now returned from UploadDir so SyncDirs surfaces them as errors - MEDIUM: added SetAgentRefreshHook on skills.Service so the provisioner can push MCPServers changes to existing agents - LOW: explicit WARNING log when health check marks a server unhealthy - LOW: RecheckHealth now logs individual scan/update/unhealthy errors instead of silently continuing Co-authored-by: Cursor <cursoragent@cursor.com>

Managed Agents emit session.status_idle between tool-call rounds (while waiting for MCP results) as well as after the agent's final text. The previous code exited on any status_idle after seeing activity, which closed the stream before tool results and the final reading arrived. Track pendingTools: tool_use increments, tool_result decrements. Only forward status_idle as 'done' when pendingTools == 0 and at least one text/tool event was seen this turn. Also add per-event debug logging (agent stream [session]: type=...) to diagnose Managed Agents Beta event ordering issues. Co-authored-by: Cursor <cursoragent@cursor.com>

teslashibe and others added 7 commits May 9, 2026 10:52

teslashibe and others added 2 commits May 10, 2026 17:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(skills): external MCP server per skill (#37)#38

feat(skills): external MCP server per skill (#37)#38
teslashibe wants to merge 9 commits into
mainfrom
feat/skill-mcp-servers

teslashibe commented May 11, 2026

Uh oh!

teslashibe commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

teslashibe commented May 11, 2026

Summary

What a skill folder looks like now

Test plan

Uh oh!

teslashibe commented May 11, 2026

Forensic Audit: PR #38 vs Issue #37

1. Findings (severity-ordered, evidence first)

HIGH — Overflow break silently drops remaining skill servers

MEDIUM — skill.yaml validation errors swallowed as warnings, not returned as errors

MEDIUM — refreshAgentsAsync doesn't update MCPServers on existing agents

MEDIUM — No unit tests for skill MCP server code paths

LOW — No explicit warning log when health check fails

LOW — RecheckHealth silently continues on individual errors

LOW — healthCheck uses http.DefaultClient

2. Gaps vs Issue Intent

3. User Stories

4. Acceptance Criteria (verified)

5. Risks and Follow-up Actions

Assumptions

Unknowns

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

HIGH — Overflow `break` silently drops remaining skill servers

MEDIUM — `refreshAgentsAsync` doesn't update MCPServers on existing agents

LOW — `RecheckHealth` silently continues on individual errors

LOW — `healthCheck` uses `http.DefaultClient`